Motorola Center for Communications




Audio Visual Interactions in Multimodal Communications

Abstract

Multimodal signal processing is more than simply "putting together" text, audio, images, and video; it is the integration of and interaction among these different media that creates new systems, new research challenges, and new opportunities.  Unimodal analysis of signals delivers acceptable performance only in benign situations; performance degrades rapidly when conditions become adverse or countermeasures are taken.  For example, person authentication systems used in security, access control, and surveillance applications do not perform well when subjects age, when the video resolution is inadequate, or when lighting conditions are poor.  Many of these difficulties can be overcome by adding an audio signature to the video.
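
Adding an audio signature to a video-based authentication system can be pictured as score-level (late) fusion: each modality's matcher produces a score, the scores are normalized to a common range, and a weighted sum is compared with a decision threshold. The following is a minimal sketch under stated assumptions: min-max normalization with bounds estimated on training data, a fixed audio weight, and a fixed threshold. These choices and all function names are illustrative and hypothetical, not the method of any system described on this page.

```python
def min_max_normalize(score: float, lo: float, hi: float) -> float:
    """Map a raw matcher score into [0, 1] using bounds estimated on training data (assumed)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Clamp so that out-of-range scores cannot dominate the fused result.
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fuse_scores(audio: float, video: float, audio_weight: float) -> float:
    """Weighted-sum fusion of two normalized scores; audio_weight lies in [0, 1]."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio + (1.0 - audio_weight) * video


def verify(audio_raw: float, video_raw: float,
           audio_bounds: tuple, video_bounds: tuple,
           audio_weight: float = 0.5, threshold: float = 0.5) -> bool:
    """Accept the claimed identity if the fused audio-visual score reaches the threshold."""
    a = min_max_normalize(audio_raw, *audio_bounds)
    v = min_max_normalize(video_raw, *video_bounds)
    return fuse_scores(a, v, audio_weight) >= threshold


# Poor lighting gives a weak video score, but a strong voice match still carries the decision.
print(verify(8.0, 2.0, (0.0, 10.0), (0.0, 10.0), audio_weight=0.7))  # True
```

Here the video score alone (0.2 after normalization) would fall below the threshold, while the fused score (about 0.62) clears it. A weighted sum is only one fusion rule; in practice the weight would be tuned on held-out data or adapted to the estimated reliability of each stream, for example lowered for video when lighting is poor.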

In multimodal communications involving human speech, audio-visual interaction is particularly significant.  Human speech perception is bimodal: the perception of acoustic speech can be affected by visual cues from lip movement.  Because of this bimodality, audio-visual interaction is an important design factor for multimodal communication systems such as video telephony and video conferencing.  A prime example of this interaction is lip reading (speechreading).  It is used by hearing-impaired people to improve their understanding of speech, but also, to some extent, by every person with normal hearing, particularly in noisy environments.

A key issue in bimodal speech analysis and synthesis is establishing the mapping between acoustic and visual parameters.  A novel approach for establishing this mapping was developed during our previous funding period.  Our current work addresses two interrelated problems.  First, we are considering the synthesis of articulatory parameters for an MPEG-4 facial animation model.  Second, we are addressing the task of robust speech recognition.  Bringing these two areas together will impact very low bit-rate coding of speech and images, speech- and text-driven generation of facial animation parameters, speech- and text-driven facial animation of synthetic actors (i.e., avatars), and audio-visual speech recognition.
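
The project's own mapping method is not described here. Purely as a point of reference, a classic and much simpler baseline for acoustic-to-visual mapping is a vector-quantization (codebook) lookup: each acoustic codeword is paired with the average visual parameter vector of the training frames quantized to it, and at synthesis time each incoming audio frame is replaced by the visual vector of its nearest codeword. This sketch assumes time-aligned audio and visual frames, an acoustic codebook that has already been trained (for example by k-means, not shown), and visual vectors standing in for MPEG-4 facial animation parameter values; the function names and toy data are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the codeword closest to x (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], x))


def train_visual_table(audio_codebook: Sequence[Sequence[float]],
                       audio_frames: Sequence[Sequence[float]],
                       visual_frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Average the visual vectors of all training frames quantized to each acoustic codeword."""
    if len(audio_frames) != len(visual_frames) or not visual_frames:
        raise ValueError("need a non-empty set of time-aligned audio/visual frame pairs")
    dim = len(visual_frames[0])
    sums = [[0.0] * dim for _ in audio_codebook]
    counts = [0] * len(audio_codebook)
    for a, v in zip(audio_frames, visual_frames):
        k = nearest(audio_codebook, a)
        counts[k] += 1
        for j in range(dim):
            sums[k][j] += v[j]
    # Codewords that received no training frames fall back to the global mean visual vector.
    global_mean = [sum(v[j] for v in visual_frames) / len(visual_frames) for j in range(dim)]
    return [[s / c for s in row] if c else list(global_mean)
            for row, c in zip(sums, counts)]


def audio_to_visual(audio_codebook: Sequence[Sequence[float]],
                    visual_table: Sequence[Vector],
                    audio_frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Map each acoustic frame to the visual vector stored for its nearest codeword."""
    return [list(visual_table[nearest(audio_codebook, a)]) for a in audio_frames]


# Toy data: 1-D acoustic features paired with two hypothetical visual parameters.
codebook = [[0.0], [1.0]]
audio = [[0.1], [0.9], [0.2], [1.1]]
visual = [[0.0, 0.0], [2.0, 1.0], [0.2, 0.0], [2.2, 1.0]]
table = train_visual_table(codebook, audio, visual)
print(audio_to_visual(codebook, table, [[0.05], [0.95]]))
```

On the toy data the two query frames map to approximately [0.1, 0.0] and [2.1, 1.0]. A frame-by-frame lookup like this ignores temporal context such as coarticulation, a known limitation of codebook mapping that HMM-based approaches address by modeling the temporal dynamics explicitly.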

Students

  • Jay Williams (Ph.D., June 2000)
  • Zhilin Wu
  • Petar Aleksic

Publications

  1. P. S. Aleksic and A. K. Katsaggelos, "Comparison of Low- and High-level Visual Features for Audio-Visual Continuous Automatic Speech Recognition," submitted for publication, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2004.
  2. Z. Wu, P. S. Aleksic, and A. K. Katsaggelos, "Inner Lip Feature Extraction for MPEG-4 Facial Animation," submitted for publication, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2004.
  3. P. S. Aleksic and A. K. Katsaggelos, "An Audio-Visual Person Identification and Verification System Using FAPs as Visual Features," Workshop on Multimedia User Authentication, Santa Barbara, California, December 2003.
  4. P. S. Aleksic and A. K. Katsaggelos, "Speech-to-Video Synthesis Using MPEG-4 Compliant Visual Features," IEEE Transactions on Circuits and Systems for Video Technology: Special Issue on Audio and Video Analysis for Multimedia Interactive Services, accepted for publication, February 2004.
  5. P. S. Aleksic and A. K. Katsaggelos, "Speech-to-Video Synthesis Using Facial Animation Parameters," International Conference on Image Processing (ICIP), Barcelona, Spain, September 2003.
  6. P. S. Aleksic and A. K. Katsaggelos, "Product HMMs for Audio-Visual Continuous Speech Recognition Using Facial Animation Parameters," International Conference on Multimedia and Expo (ICME), Baltimore, July 2003.
  7. P. S. Aleksic, J. J. Williams, and A. K. Katsaggelos, "Speech-to-Video Synthesis Using MPEG-4 Compliant Visual Features," 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), London, April 2003.
  8. P. S. Aleksic, J. J. Williams, Z. Wu, and A. K. Katsaggelos, "Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features," EURASIP Journal on Applied Signal Processing, Special Issue on Joint Audio-Visual Speech Processing, vol. 2002, no. 11, pp. 1213-1227, November 2002.
  9. A. K. Katsaggelos and P. S. Aleksic, "Audio-Visual Interaction in Multimedia Communications," Proceedings of the International Telecommunications Conference, pp. 47-52, Santa Rita do Sapucai, Brazil, October 2002.
  10. Z. Wu, P. S. Aleksic, and A. K. Katsaggelos, "Lip Tracking for MPEG-4 Facial Animation," International Conference on Multimodal Interfaces (ICMI), pp. 293-298, Pittsburgh, October 2002.
  11. P. S. Aleksic, J. J. Williams, Z. Wu, and A. K. Katsaggelos, "Audio-Visual Continuous Speech Recognition Using MPEG-4 Compliant Visual Features," International Conference on Image Processing (ICIP), pp. 960-963, Rochester, NY, September 2002.
  12. P. S. Aleksic, J. J. Williams, Z. Wu, and A. K. Katsaggelos, "Audio-Visual Continuous Speech Recognition Using MPEG-4 Compliant Visual Features," Defense Advanced Research Projects Agency (DARPA) Multimodal Speech Recognition Workshop, N.C. A&T State University, June 2002.
  13. J. J. Williams, A. K. Katsaggelos, and M. A. Randolph, "A Hidden Markov Model Based Visual Speech Synthesizer," Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Istanbul, Turkey, June 5-9, 2000.
  14. J. J. Williams, J. C. Rutledge, D. C. Garstecki, and A. K. Katsaggelos, "Frame Rate and Viseme Analysis for Multimedia Applications," Journal of VLSI Signal Processing Systems, vol. 23, nos. 1/2, pp. 7-23, October 1998.
  15. J. J. Williams, J. C. Rutledge, D. C. Garstecki, and A. K. Katsaggelos, "Frame Rate and Viseme Analysis for Multimedia Applications," Proc. IEEE First Workshop on Multimedia Signal Processing, pp. 13-18, Princeton, NJ, June 23-25, 1997.

Theses

  1. J. J. Williams, "Speech-to-Video Conversion for Individuals with Impaired Hearing," Ph.D. Thesis, Department of Electrical and Computer Engineering, Northwestern University, June 2000.








Northwestern University
Send questions and comments to Webmaster@dimitra.ece.northwestern.edu
Motorola